Grammar Acquisition and Statistical Parsing by exploiting Local Contextual Information

Author

  • Manabu Okumura
Abstract

This paper presents a method for inducing a context-sensitive conditional probability context-free grammar from an unlabeled bracketed corpus using local contextual information, and describes a natural language parsing model that uses a probability-based scoring function of the grammar to rank the parses of a sentence. The method uses clustering techniques to group the brackets in a corpus into a number of similar bracket groups based on their local contextual information. Using these groups, the corpus is automatically labeled with nonterminal labels, and a grammar with conditional probabilities is acquired from the labeled corpus. Based on these conditional probabilities, the statistical parsing model provides a framework for finding the most likely parse of a sentence. Experiments were conducted on the EDR corpus and the Wall Street Journal corpus. The results show that the approach achieves relatively high accuracy: 88% recall, 72% precision, and 0.7 crossing brackets per sentence for sentences shorter than 10 words, and 71% recall, 51% precision, and 3.4 crossing brackets for sentences of 10 to 19 words. This result supports the assumption that local contextual statistics obtained from an unlabeled bracketed corpus are effective for learning a useful grammar and for parsing.
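The abstract reports bracket recall, bracket precision, and crossing brackets but does not define them. The following is a minimal sketch of how such measures are commonly computed for unlabeled bracketings, assuming each bracket is represented as a (start, end) word span with an exclusive end. The function names and the span representation are illustrative assumptions, not taken from the paper, and the paper's exact scoring conventions may differ.

```python
from typing import Set, Tuple

# Assumed representation: a bracket is a (start, end) span over word
# positions, with `end` exclusive. This is an illustrative choice.
Span = Tuple[int, int]


def crosses(a: Span, b: Span) -> bool:
    """Return True if the two spans overlap without either containing the other."""
    return a[0] < b[0] < a[1] < b[1] or b[0] < a[0] < b[1] < a[1]


def bracket_scores(predicted: Set[Span], gold: Set[Span]) -> Tuple[float, float, int]:
    """Compute unlabeled bracket recall, precision, and crossing-bracket count
    for a single sentence (hypothetical helper, not the paper's scorer)."""
    matched = len(predicted & gold)
    recall = matched / len(gold) if gold else 0.0
    precision = matched / len(predicted) if predicted else 0.0
    # A predicted bracket counts as crossing if it crosses any gold bracket.
    crossing = sum(1 for p in predicted if any(crosses(p, g) for g in gold))
    return recall, precision, crossing


# Toy five-word sentence: the gold tree has four brackets, the parser proposes three.
gold = {(0, 5), (0, 2), (2, 5), (3, 5)}
predicted = {(0, 5), (0, 2), (1, 3)}
recall, precision, crossing = bracket_scores(predicted, gold)
```

In this toy case two of the three predicted brackets match gold brackets, and the predicted span (1, 3) crosses the gold span (0, 2). Corpus-level figures like those in the abstract would come from aggregating such counts over all test sentences, for example averaging the crossing count per sentence.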

Related resources

KEY WORDS-Statistical Parsing, Grammar Acquisition, Clustering Analysis, Local Contextual

This paper proposes a new method for learning a context-sensitive conditional probability context-free grammar from an unlabeled bracketed corpus based on clustering analysis and describes a natural language parsing model which uses a probability-based scoring function of the grammar to rank parses of a sentence. By grouping brackets in a corpus into a number of similar bracket groups based on ...

Grammar Acquisition Based on Clustering Analysis and Its Application to Statistical Parsing

This paper proposes a new method for learning a context-sensitive conditional probability context-free grammar from an unlabeled bracketed corpus based on clustering analysis and describes a natural language parsing model which uses a probability-based scoring function of the grammar to rank parses of a sentence. By grouping brackets in a corpus into a number of similar bracket groups based on ...

Exploiting Contextual Information In Hypothesis Selection For Grammar Refinement

In this paper, we propose a new framework of grammar development and some techniques for exploiting contextual information in a process of grammar refinement. The proposed framework involves two processes, partial grammar acquisition and grammar refinement. In the former process, a rough grammar is constructed from a bracketed corpus. The grammar is later refined by the latter process where a c...

Statistical Parsing with a Grammar Acquired from a Bracketed Corpus Based on Clustering Analysis

This paper proposes a new method for learning a context-sensitive conditional probability context-free grammar from an unlabeled bracketed corpus based on clustering analysis and describes a natural language parsing model which uses a probability-based scoring function of the grammar to rank parses of a sentence. By grouping brackets in a corpus into a number of similar bracket groups based on ...

Hard Constraints for Grammatical Function Labelling

For languages with (semi-) free word order (such as German), labelling grammatical functions on top of phrase-structural constituent analyses is crucial for making them interpretable. Unfortunately, most statistical classifiers consider only local information for function labelling and fail to capture important restrictions on the distribution of core argument functions such as subject, object ...

Publication date: 1998